Abstract. We give an example of a zero-sum stochastic game with four states, compact action sets
for each player, and continuous payoff and transition functions, such that the discounted value does not
converge as the discount factor tends to 0, and the value of the $n$-stage game does not converge as $n$
goes to infinity.



1. Introduction 

Two-person zero-sum stochastic games have been widely studied since Shapley introduced them in [55].
They model interactions repeated in discrete time between two players with opposite interests. The state 
of nature evolves as a function of the current state and of the actions chosen by each player, and determines 
which zero-sum game the players are facing at each time period. Hence, the actions of the players have an 
influence both on the payoff today and on the law of the state of nature tomorrow. 

There are several ways of evaluating the payoff of such a stochastic game. For any integer $n \in \mathbb{N}$, one
defines the $n$-stage game, in which Player 1 (resp. Player 2) maximizes (resp. minimizes) his average
gain over the first $n$ stages. For any $\lambda \in ]0, 1]$, one defines the $\lambda$-discounted game, in which Player 1 (resp.
Player 2) maximizes (resp. minimizes) his $\lambda$-discounted payoff. Some of the main questions in the theory
of zero-sum stochastic games concern the asymptotic behavior of the values of these games as players
grow more and more patient:

• Does the value of the $n$-stage game converge as $n$ tends to infinity?

• Does the value of the $\lambda$-discounted game converge as $\lambda$ tends to 0?

• Are the two limits equal?

When the answers to these three questions are positive, the game is said to have an asymptotic value. A
nice explanation of why the asymptotic value should exist for sufficiently regular games is the following [5S].
An $n$-stage game can be seen as a game played in the time interval [0, 1], where the payoff at time $t$ is $g_t$, and
in which the players only move at times $k/n$. Similarly, in a $\lambda$-discounted game, they only play at times $\lambda$,
$\lambda + \lambda(1-\lambda)$, and so on. As $n$ goes to infinity and $\lambda$ goes to 0, these games can thus be viewed as
time discretizations of a hypothetical game played in continuous time on [0, 1], and thus the values should
converge to the value of this "limit game".

Stochastic games were first studied in the case of a finite number of states and when each player has
only finitely many actions. Existence and characterization of the values for a fixed $\lambda$ or $n$ is due to Shapley
[26] and relies on von Neumann's minmax theorem [17] as well as Banach's fixed point theorem. In this
framework, the existence of the asymptotic value was established first for recursive and absorbing games [12], then in
general (see [SJ [S] for the original proof using the Tarski-Seidenberg theorem, or [19] for a recent proof
involving linear programming).

Since minmax theorems also hold true for games with compact action sets and continuous payoffs [27],
the values exist [14] for fixed $n$ or $\lambda$ for games with finitely many states, compact action sets for each
player, and continuous payoff and transitions. In this framework, the existence of the asymptotic value was established for
recursive [IHIIST] and absorbing [211 [HI] games, and was conjectured to hold true in general [55].

Let us mention that the existence of an asymptotic value was established in the framework of Markov 
decision processes and dynamic programming [5J [3J [JJ 1101 121] ; for games with incomplete information 
PD [m [Ml ; as well as for some stochastic games with incomplete information [50] [221 [221 [2S] • 

In this paper we answer the conjecture in [23] in the negative by constructing a game with four
states, compact action sets, and continuous payoff and transitions, whose values do not converge as $n$



This research was supported by grant ANR-IO-BLAN 0112 (France). 
tends to infinity or $\lambda$ tends to 0. Surprisingly, it is possible to construct a compact game in which Player
1 can guarantee a payoff of 1 in any 10^'^— stage game, while Player 2 can guarantee a payoff of —1 in 
any 10^'"'"*'^— stage game. The idea of the counterexample is to construct transition functions that are 
continuous but oscillate infinitely often. These oscillations of the transition functions yield oscillations - 
and thus divergence - of the values. 

The paper is structured as follows. The first section gives the model of compact stochastic games
and defines discounted and finitely repeated values. The next section is the main one, in which some
counterexamples are constructed: first we give some examples in which the discounted value diverges, then
we show that the value of the $n$-stage game diverges as well for some of these examples. The last section
gives some concluding remarks as well as some open questions.

2. Model 

A compact two-person zero-sum stochastic game $\Gamma$ is defined by a finite state space $\Omega$, compact metric
action spaces $I$ and $J$ for Players 1 and 2 (we denote the mixed action sets of Player 1 and Player 2 by
$X = \Delta(I)$ and $Y = \Delta(J)$, respectively), a jointly continuous bounded real payoff $g$ on $I \times J \times \Omega$, and a
jointly continuous transition $p$ from $I \times J \times \Omega$ to $\Delta(\Omega)$. When $I$ and $J$ are finite the game is said to be
finite.

The game is played in discrete time. The initial state $\omega_1 \in \Omega$ is known by both players. At stage $t$,
given the state $\omega_t$, the players independently choose mixed moves $x_t \in X$ and $y_t \in Y$. The stage actions
$i_t$ and $j_t$ are drawn according to $x_t$ and $y_t$ respectively. The stage payoff is $g_t = g(i_t, j_t, \omega_t)$, the new state
$\omega_{t+1}$ is selected according to $p(i_t, j_t, \omega_t)$, and $(i_t, j_t, \omega_{t+1})$ is announced to the players.

We are mainly interested in discounted games: for any discount factor $\lambda \in ]0, 1]$, the $\lambda$-discounted game
with initial state $\omega_1$ is denoted $\Gamma_\lambda(\omega_1)$; in this game Player 1 (resp. Player 2) maximizes (resp. minimizes)
the expectation of $\sum_{t \geq 1} \lambda (1-\lambda)^{t-1} g_t$. The game $\Gamma_\lambda(\omega_1)$ has a value denoted by $v_\lambda(\omega_1)$, and one proves
(see [in] in the finite case and [H] in the compact one) that the function $v_\lambda : \Omega \to \mathbb{R}$ is the only fixed
point of the following equation:

\[ f(\omega) = \min_{y \in Y} \max_{x \in X} \left\{ \lambda g(x,y,\omega) + (1-\lambda) E_{p(x,y,\omega)} f(\cdot) \right\} \tag{1} \]

\[ \phantom{f(\omega)} = \max_{x \in X} \min_{y \in Y} \left\{ \lambda g(x,y,\omega) + (1-\lambda) E_{p(x,y,\omega)} f(\cdot) \right\}, \tag{2} \]

where $g$ and $p$ are bilinearly extended to $X \times Y$, and the permutation of min and max is possible according
to Sion's theorem [27].

The following lemma gives an interesting sufficient condition for a function to be equal to $v_\lambda$.

Definition 1. A mixed action $x \in X$ (resp. $y \in Y$) is equalizing for the function $f$ in $\Gamma_\lambda(\omega)$ if for every
$y \in Y$ (resp. every $x \in X$),

\[ f(\omega) = \lambda g(x,y,\omega) + (1-\lambda) E_{p(x,y,\omega)} f(\cdot). \]

Lemma 2. Let $\lambda \in ]0, 1]$ and assume that there exists a function $f$ such that for any state $\omega$, both players
have an equalizing action in $\Gamma_\lambda(\omega)$. Then $f = v_\lambda$.

Proof. Such an $f$ is a fixed point of (1) and (2), and $v_\lambda$ is the unique fixed point of these equations. □

The finitely repeated stochastic game with horizon $n$ and initial state $\omega_1$ is the game in which Player 1
(resp. Player 2) maximizes (resp. minimizes) the expectation of $\frac{1}{n}\sum_{t=1}^{n} g_t$. Its value is denoted by $v_n(\omega_1)$.

A compact stochastic game is said to have an asymptotic value if $v_\lambda$ and $v_n$ converge (as $\lambda$ goes to 0
and $n$ to infinity respectively) and if the limits are the same.

In the next section we construct a compact stochastic game such that neither $v_\lambda$ nor $v_n$ converges.
Hence there exists a compact stochastic game with no asymptotic value.

3. Main section 

The main result of the paper is: 

Theorem 3. There exists a stochastic game with 4 states, in which the action sets are real intervals, the
payoff and transition functions are continuous, and for which neither $v_\lambda$ nor $v_n$ converges.

The remainder of this section is dedicated to the proof of this theorem. 



For a compact metric space $K$, $\Delta(K)$ denotes the set of Borel probabilities on $K$, endowed with the weak-* topology.
3.1. The intuition behind the construction. Before going to the explicit construction of a counterexample
we give some intuition about it and exhibit a subclass of compact games that is likely to contain a
counterexample (if such a counterexample exists). We would like this class to be as small as possible,
in order to make it easier to find a precise counterexample within it.

First, recall that a compact absorbing game has an asymptotic value [24], so in any counterexample
there must be at least two nonabsorbing states, and we consider the simplest case in which there are exactly
two. Since in any compact stochastic game $v_\lambda(\cdot)$ converges for at least two initial states [15] [18],
there must be at least four states. To make things simpler we may as well assume that the states for which
$v_\lambda(\cdot)$ converges are absorbing, with different payoffs (else our game would be equivalent to a three-state
game), say $-1$ and $1$.

We also remark that, in compact games, it is the transition functions, rather than the payoff functions,
that are most likely to be a source of oscillations of the values $v_\lambda$. A small variation of $g$ induces a small
variation of $v_\lambda$; this is not the case for small variations of $p$. So, once again to simplify as much as possible,
we assume that the payoff does not depend on the actions played by the players. Since compact recursive
games have an asymptotic value, the payoffs in the two nonabsorbing states must be different, say $1$
and $-1$.

It remains to understand which transition functions are likely to be problematic. First of all, we argue
that under optimal play in $\Gamma_\lambda$, the absorption probability in each stage should be of the order of $\lambda$. Indeed,
if it were much smaller than $\lambda$, then absorption would happen when the game has almost ended (that is,
when the remaining part of the discounted payoff is negligible), so the absorbing states would be irrelevant
and we might as well remove them. This would give us fewer than four states and thus an asymptotic value.
On the other hand, if it were much greater than $\lambda$, absorption would occur almost immediately and the
same play would give the same payoff for all small $\lambda$.

Similarly, we claim that the probability of transition from one nonabsorbing state to the other should be of
the order of $\lambda^\alpha$, for some $\alpha$ in $]0, 1[$: if smaller it would almost never happen before absorption; and if
higher it would happen so often that the two states would be essentially the same, leaving us with a three-state
game and an asymptotic value.

Before considering compact games, we first briefly study some finite games having all these
features, to understand why $v_\lambda$ converges in the finite case and might not in the compact one. In fact, it
turns out that such a game was already studied by Bewley and Kohlberg ([7] page 120). We make the
following slight generalization: consider the following family of finite stochastic games, where $p^*_+$ and $p^*_-$
are two parameters in [0, 1].

• There are two nonabsorbing states $\omega_+$ and $\omega_-$, and two absorbing states $1^*$ and $-1^*$.

• Both players have two pure actions, Stay and Quit.

• The payoff in each state is independent of the actions: it is $1$ in $\omega_+$ and $1^*$; $-1$ in $\omega_-$ and $-1^*$.

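• The transitions are given by two $2 \times 2$ matrices, one for each nonabsorbing state, with rows indexed by the
action (Stay or Quit) of Player 1 and columns by the action (Stay or Quit) of Player 2. The entries recoverable
here are those for (Quit, Quit): from $\omega_+$ the play moves to $p^*_+ \cdot 1^* + (1-p^*_+) \cdot \omega_+$, and from $\omega_-$ to
$p^*_- \cdot (-1^*) + (1-p^*_-) \cdot \omega_-$.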



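Calculations show that:

• $\lim v_\lambda = v$ with $v(\omega_+) = v(\omega_-) = \frac{\sqrt{p^*_+} - \sqrt{p^*_-}}{\sqrt{p^*_+} + \sqrt{p^*_-}}$.

• Optimal mixed actions in $\Gamma_\lambda$ are given, for $k \in \{+,-\}$, by $x_\lambda(\omega_k) = y_\lambda(\omega_k) \sim \sqrt{\lambda / p^*_k}$ as $\lambda$ goes to 0
(we identify a mixed action with the probability assigned to Quit).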

Recall that in any one-shot zero-sum game, if an optimal action of a player is completely mixed, any
optimal action of the other player is equalizing. Thus, since both $x_\lambda$ and $y_\lambda$ are completely mixed, they
are both equalizing in $\Gamma_\lambda$.
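As a numerical illustration, the discounted values of a small instance can be computed by iterating the
fixed-point equation (1). The sketch below is not the computation used in the paper. It assumes (our
assumption, chosen to be consistent with the Quit/Quit entries and with the orders of magnitude discussed
above) that Stay/Stay keeps the play in the current nonabsorbing state and that a single Quit moves it to the
other nonabsorbing state. Under this assumption the values should approach
$(\sqrt{p^*_+}-\sqrt{p^*_-})/(\sqrt{p^*_+}+\sqrt{p^*_-})$ as $\lambda$ goes to 0. The helper names `value_2x2`, `stage`
and `discounted_values` are hypothetical.

```python
import math


def value_2x2(a11, a12, a21, a22):
    """Value of the zero-sum matrix game [[a11, a12], [a21, a22]]; the row player maximizes."""
    maxmin = max(min(a11, a12), min(a21, a22))
    minmax = min(max(a11, a21), max(a12, a22))
    if maxmin >= minmax:
        return maxmin  # saddle point in pure actions
    # No pure saddle point: both players mix (standard 2x2 formula).
    return (a11 * a22 - a12 * a21) / (a11 + a22 - a12 - a21)


def stage(lam, payoff, continuation):
    """Stage payoff plus discounted continuation value."""
    return lam * payoff + (1.0 - lam) * continuation


def discounted_values(lam, p_plus, p_minus, tol=1e-12):
    """Iterate the Shapley operator (a (1 - lam)-contraction) until it moves by less than tol.

    ASSUMED transitions (rows: Player 1, columns: Player 2, order Stay, Quit):
    Stay/Stay stays put, a single Quit switches nonabsorbing state,
    Quit/Quit absorbs with probability p_plus (resp. p_minus).
    """
    v_plus, v_minus = 0.0, 0.0
    while True:
        new_plus = value_2x2(
            stage(lam, 1.0, v_plus), stage(lam, 1.0, v_minus),
            stage(lam, 1.0, v_minus), stage(lam, 1.0, p_plus + (1.0 - p_plus) * v_plus),
        )
        new_minus = value_2x2(
            stage(lam, -1.0, v_minus), stage(lam, -1.0, v_plus),
            stage(lam, -1.0, v_plus), stage(lam, -1.0, -p_minus + (1.0 - p_minus) * v_minus),
        )
        gap = max(abs(new_plus - v_plus), abs(new_minus - v_minus))
        v_plus, v_minus = new_plus, new_minus
        if gap < tol:
            return v_plus, v_minus


lam, p_plus, p_minus = 1e-3, 1.0, 0.25
v_plus, v_minus = discounted_values(lam, p_plus, p_minus)
limit = (math.sqrt(p_plus) - math.sqrt(p_minus)) / (math.sqrt(p_plus) + math.sqrt(p_minus))
print(v_plus, v_minus, limit)
```

For a fixed small $\lambda$ the two values differ from the limit by a term of order $\sqrt{\lambda}$, so the comparison
is only approximate; decreasing `lam` tightens it at the cost of more iterations.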

Taking the mixed extension of this finite game we get a compact game. The (now pure) actions
$x_\lambda$ and $y_\lambda$ are optimal in its $\lambda$-discounted version. Since we want to discuss the influence of the parameters of the game on
the transitions under optimal play, it is convenient to relabel the actions so that the optimal action of a


Interestingly, this game was, at the time, a potential example of a finite game with no uniform value. In their example the
payoff does depend on the chosen actions, but this is irrelevant as it does not change the asymptotics of the optimal play.

Their example is the particular case $p^*_+ = p^*_- = 1$.






GUILLAUME VIGERAL 



player in $\Gamma_\lambda$ depends only on $\lambda$ and not on $p^*_+$ and $p^*_-$. For any nonabsorbing state $\omega_k$, it can be shown that
$x_\lambda(\omega_k) = y_\lambda(\omega_k)$ is monotone in $\lambda$ for $\lambda$ small enough. Hence, by some suitable change of variables for the
actions of each player in each state we get a compact game such that the stationary strategy $\lambda$ in each
state is optimal (and equalizing) for each player. We have thus constructed a compact game such that:

• There are two nonabsorbing states $\omega_+$ and $\omega_-$, and two absorbing states $1^*$ and $-1^*$.

• The set of actions of each player is [0, 1].

• In each state, for each player, the pure action $\lambda$ is equalizing in $\Gamma_\lambda$ for $\lambda$ small enough.



v(a;+) = v{uj-) 



While these games are compact games, they are very specific ones since they are (up to a change
of variables) mixed extensions of finite games. In particular the transition functions are linear (up to
a change of variables), and this is what entails the convergence of $v_\lambda$. A natural idea is to use the
additional freedom in general compact games with interval action sets to construct a similar game such 
that p(w_|i, 7, ) = f+f.^ (where is no longer a constant but a function of i and j), and similar 

formulas for the other transitions. If ^ is slowly oscillating between two positive constants (which could 

not happen, by linearity, in the finite case), we expect that the value $v_\lambda$ also oscillates and thus does not
converge.

Because of this discussion, in the following we will only consider compact games played in pure (and
not mixed) actions. This is very convenient since it yields easier computations. Of course in general there
is no reason for the values $v_n$ and $v_\lambda$ to exist for a game played in pure actions; however in the following
we show how to construct a game for which the values exist but do not converge.

3.2. A class of compact games. As the last section motivates us to do, let us consider the class $\mathcal{G}$ of
compact stochastic games satisfying the following properties:

a) There are two nonabsorbing states $\omega_+$ and $\omega_-$, and two absorbing states $1^*$ and $-1^*$.

b) The action set of each player (denoted by $I$ and $J$ respectively) is the interval $[0, \frac{1}{16}]$.

c) The payoff depends only on the state: for all actions $i$ and $j$, $g(i,j,\omega_+) = g(i,j,1^*) = 1$ and
$g(i,j,\omega_-) = g(i,j,-1^*) = -1$.

d) The transition probability $p$ is (jointly) continuous, and for all actions $i$ and $j$, $p(-1^*|i,j,\omega_+) =
p(1^*|i,j,\omega_-) = 0$.

e) In each nonabsorbing state and for each player, the pure action $\lambda$ is equalizing in the discounted
game $\Gamma_\lambda$. That is, for each $\lambda \in ]0, \frac{1}{16}]$, and for each $i \in I$ and $j \in J$, the discounted value $v_\lambda$
satisfies

\[ v_\lambda(\omega_+) = \lambda + (1-\lambda)\left[ p^*_+(\lambda,j) + p_+(\lambda,j) v_\lambda(\omega_-) + (1 - p^*_+(\lambda,j) - p_+(\lambda,j)) v_\lambda(\omega_+) \right] \tag{3} \]

\[ v_\lambda(\omega_+) = \lambda + (1-\lambda)\left[ p^*_+(i,\lambda) + p_+(i,\lambda) v_\lambda(\omega_-) + (1 - p^*_+(i,\lambda) - p_+(i,\lambda)) v_\lambda(\omega_+) \right] \tag{4} \]

\[ v_\lambda(\omega_-) = -\lambda + (1-\lambda)\left[ -p^*_-(\lambda,j) + p_-(\lambda,j) v_\lambda(\omega_+) + (1 - p^*_-(\lambda,j) - p_-(\lambda,j)) v_\lambda(\omega_-) \right] \tag{5} \]

\[ v_\lambda(\omega_-) = -\lambda + (1-\lambda)\left[ -p^*_-(i,\lambda) + p_-(i,\lambda) v_\lambda(\omega_+) + (1 - p^*_-(i,\lambda) - p_-(i,\lambda)) v_\lambda(\omega_-) \right]. \tag{6} \]

We remark that to define a game in $\mathcal{G}$ one only needs to specify the four functions

\[ p^*_+(i,j) = p(1^*|i,j,\omega_+), \quad p_+(i,j) = p(\omega_-|i,j,\omega_+), \quad p^*_-(i,j) = p(-1^*|i,j,\omega_-), \quad p_-(i,j) = p(\omega_+|i,j,\omega_-), \]

since necessarily $p(\omega_+|i,j,\omega_+) = 1 - p^*_+(i,j) - p_+(i,j)$ and $p(\omega_-|i,j,\omega_-) = 1 - p^*_-(i,j) - p_-(i,j)$.

Also we observe that equations (3) to (6) characterize $v_\lambda$: any function $w_\lambda : \{\omega_+, \omega_-\} \to \mathbb{R}$
satisfying the same system must be the discounted value of the game according to Lemma 2. Also remark
that this implies that the discounted games have a value in pure strategies.

We first establish Theorem 3 for discounted values:

Theorem 4. There exists a game in $\mathcal{G}$ such that $v_\lambda$ does not converge as $\lambda$ goes to 0.



For reasons that will become clear later (division by $1 - \lambda$) it is better not to take $I = [0, 1]$ but a smaller interval.
The idea of the construction of such an example is to think of the family $(v_\lambda)_{\lambda \in ]0, \frac{1}{16}]}$ as a parameter
of the game, and of the transition functions as unknowns, rather than the opposite. The construction is
done in three steps: first, for any family $v_\lambda$ we identify good candidates $p^*_+$, $p^*_-$, $p_+$, $p_-$ that may lead to
the value $v_\lambda$ in $\Gamma_\lambda$. These candidate functions are in general neither $[0, 1]$-valued nor continuous; but in a second
step we show that when they are, they indeed define a game in $\mathcal{G}$ with value $v_\lambda$. Finally, we find a
family $v_\lambda$ that does not converge as $\lambda$ goes to 0, but such that the constructed candidates $p^*_+$, $p^*_-$, $p_+$, $p_-$
have the required regularity.

So let us fix a family $v_\lambda$ and try to find suitable functions $p^*_+$, $p^*_-$, $p_+$ and $p_-$. By simplifying a bit
equations (3) to (6), and replacing $\lambda$ by $\mu$ in (4) and (6), one gets the following system (where $\lambda$ and $\mu$ are
in $]0, \frac{1}{16}]$ while $i$ and $j$ are in $[0, \frac{1}{16}]$):



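\[ v_\lambda(\omega_+) = \frac{\lambda + (1-\lambda)\left[ p^*_+(\lambda,j) + p_+(\lambda,j) v_\lambda(\omega_-) \right]}{\lambda + (1-\lambda) p^*_+(\lambda,j) + (1-\lambda) p_+(\lambda,j)} \tag{7} \]

\[ v_\mu(\omega_+) = \frac{\mu + (1-\mu)\left[ p^*_+(i,\mu) + p_+(i,\mu) v_\mu(\omega_-) \right]}{\mu + (1-\mu) p^*_+(i,\mu) + (1-\mu) p_+(i,\mu)} \tag{8} \]

\[ v_\lambda(\omega_-) = \frac{-\lambda + (1-\lambda)\left[ -p^*_-(\lambda,j) + p_-(\lambda,j) v_\lambda(\omega_+) \right]}{\lambda + (1-\lambda) p^*_-(\lambda,j) + (1-\lambda) p_-(\lambda,j)} \tag{9} \]

\[ v_\mu(\omega_-) = \frac{-\mu + (1-\mu)\left[ -p^*_-(i,\mu) + p_-(i,\mu) v_\mu(\omega_+) \right]}{\mu + (1-\mu) p^*_-(i,\mu) + (1-\mu) p_-(i,\mu)} \tag{10} \]

In particular taking $j = \mu$ in (7) and $i = \lambda$ in (8) one gets, for each couple $\lambda, \mu$ in $]0, \frac{1}{16}]$, the system

\[ v_\lambda(\omega_+) = \frac{\lambda + (1-\lambda)\left[ p^*_+(\lambda,\mu) + p_+(\lambda,\mu) v_\lambda(\omega_-) \right]}{\lambda + (1-\lambda) p^*_+(\lambda,\mu) + (1-\lambda) p_+(\lambda,\mu)}, \qquad
v_\mu(\omega_+) = \frac{\mu + (1-\mu)\left[ p^*_+(\lambda,\mu) + p_+(\lambda,\mu) v_\mu(\omega_-) \right]}{\mu + (1-\mu) p^*_+(\lambda,\mu) + (1-\mu) p_+(\lambda,\mu)}. \]

It is convenient to denote $s(\lambda) = \frac{v_\lambda(\omega_+) + v_\lambda(\omega_-)}{2}$ and $d(\lambda) = \frac{v_\lambda(\omega_+) - v_\lambda(\omega_-)}{2}$, so the system becomes

\[ \begin{cases}
(1-\lambda)(s(\lambda) + d(\lambda) - 1)\, p^*_+(\lambda,\mu) + 2(1-\lambda) d(\lambda)\, p_+(\lambda,\mu) = \lambda (1 - s(\lambda) - d(\lambda)) \\
(1-\mu)(s(\mu) + d(\mu) - 1)\, p^*_+(\lambda,\mu) + 2(1-\mu) d(\mu)\, p_+(\lambda,\mu) = \mu (1 - s(\mu) - d(\mu)).
\end{cases} \]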

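When $\lambda \neq \mu$ the unique solution (assuming for a moment that the system is not degenerate) is given by

\[ p_+(\lambda,\mu) = \frac{(\lambda-\mu)(1 - s(\lambda) - d(\lambda))(1 - s(\mu) - d(\mu))}{2(1-\lambda)(1-\mu)\left[ d(\lambda)(1 - s(\mu)) - d(\mu)(1 - s(\lambda)) \right]} \tag{11} \]

\[ p^*_+(\lambda,\mu) = \frac{\lambda(1-\mu) d(\mu)(1 - s(\lambda) - d(\lambda)) - \mu(1-\lambda) d(\lambda)(1 - s(\mu) - d(\mu))}{(1-\lambda)(1-\mu)\left[ d(\lambda)(1 - s(\mu)) - d(\mu)(1 - s(\lambda)) \right]} \tag{12} \]

Similarly, considering equations (9) and (10) yields, for $\lambda \neq \mu$ in $]0, \frac{1}{16}]$,

\[ p_-(\lambda,\mu) = \frac{(\lambda-\mu)(1 + s(\lambda) - d(\lambda))(1 + s(\mu) - d(\mu))}{2(1-\lambda)(1-\mu)\left[ d(\lambda)(1 + s(\mu)) - d(\mu)(1 + s(\lambda)) \right]} \tag{13} \]

\[ p^*_-(\lambda,\mu) = \frac{\lambda(1-\mu) d(\mu)(1 + s(\lambda) - d(\lambda)) - \mu(1-\lambda) d(\lambda)(1 + s(\mu) - d(\mu))}{(1-\lambda)(1-\mu)\left[ d(\lambda)(1 + s(\mu)) - d(\mu)(1 + s(\lambda)) \right]} \tag{14} \]

In general there is no guarantee that the functions defined by equations (11) to (14) will be positive,
continuously extendable, or even well defined. However we now show that when they are, they define a
game in $\mathcal{G}$.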
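The algebra leading to the candidate transitions in $\omega_+$ can be checked numerically. The sketch below
evaluates the solution of the $2 \times 2$ linear system in $p^*_+(\lambda,\mu)$ and $p_+(\lambda,\mu)$ for one pair
$(\lambda,\mu)$ and verifies that both equations of the system hold. The input $s(x) = \sin(\ln x)/16$, $d(x) = \sqrt{x}$
is only an illustrative choice made here, and the function names are hypothetical.

```python
import math


def s(x):
    # Illustrative choice (an assumption of this sketch): bounded by 1/16.
    return math.sin(math.log(x)) / 16.0


def d(x):
    return math.sqrt(x)


def candidates_plus(lam, mu):
    """Return (p*_+, p_+) at (lam, mu), lam != mu, solving the linear system in omega_+."""
    a_lam = 1.0 - s(lam) - d(lam)
    a_mu = 1.0 - s(mu) - d(mu)
    det = (1.0 - lam) * (1.0 - mu) * (d(lam) * (1.0 - s(mu)) - d(mu) * (1.0 - s(lam)))
    p = (lam - mu) * a_lam * a_mu / (2.0 * det)
    p_star = (lam * (1.0 - mu) * d(mu) * a_lam - mu * (1.0 - lam) * d(lam) * a_mu) / det
    return p_star, p


def residuals(lam, mu):
    """Left-hand side minus right-hand side of the two equations of the system."""
    p_star, p = candidates_plus(lam, mu)
    out = []
    for t in (lam, mu):
        lhs = (1.0 - t) * (s(t) + d(t) - 1.0) * p_star + 2.0 * (1.0 - t) * d(t) * p
        rhs = t * (1.0 - s(t) - d(t))
        out.append(lhs - rhs)
    return out


p_star, p = candidates_plus(0.01, 0.04)
res = residuals(0.01, 0.04)
print(p_star, p, res)
```

The same check applies to the candidates in $\omega_-$ after replacing $s$ by $-s$.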

Definition 5. A pair $(s, d)$ of continuous functions from $]0, \frac{1}{16}]$ to $\mathbb{R}$ is feasible if there exists a game in $\mathcal{G}$
such that

\[ v_\lambda(\omega_+) = s(\lambda) + d(\lambda), \qquad v_\lambda(\omega_-) = s(\lambda) - d(\lambda). \]

Lemma 6. Assume that for $\lambda \neq \mu$ in $]0, \frac{1}{16}]$, the quantities defined in equations (11) to (14) are well
defined, with values in $[0, \frac{1}{2}]$. Also assume that the four functions can be continuously extended to $[0, \frac{1}{16}]^2$.
Then $(s, d)$ is feasible.

Proof. Let $\Gamma$ be the stochastic game satisfying assumptions a) to d) and with transition functions defined
by equations (11) to (14) (and their continuous extensions), by $p(\omega_+|i,j,\omega_+) = 1 - p^*_+(i,j) - p_+(i,j) \in [0, 1]$
and by $p(\omega_-|i,j,\omega_-) = 1 - p^*_-(i,j) - p_-(i,j) \in [0, 1]$. It remains to show that assumption e) is satisfied,
with $v_\lambda(\omega_+) = s(\lambda) + d(\lambda)$ and $v_\lambda(\omega_-) = s(\lambda) - d(\lambda)$. By construction, for every discount factor $\lambda$ equations



3.3. Construction of a specific counterexample. To establish Theorem 4 it is thus enough to find a
couple $(s, d)$ such that the assumptions of Lemma 6 are satisfied but $s(\lambda) \pm d(\lambda)$ does not converge as $\lambda$
goes to 0. We first give an intuition leading to our choice of specific $d$ and $s$.

Let $(s, d)$ be any feasible couple. Then, for the values not to converge, it is necessary that $d(\lambda)$ tends
slowly to 0 as $\lambda$ goes to 0, for the following reasons.

• Let $v_1$ and $v_2$ be any two accumulation points of $v_\lambda$ such that $\max_\omega \{v_1(\omega) - v_2(\omega)\} > 0$. Define
$\Omega_1 = \mathrm{Argmax}_{\omega} \{v_1(\omega) - v_2(\omega)\}$ and $\Omega_2 = \mathrm{Argmax}_{\omega \in \Omega_1} \{v_1(\omega)\}$. Reasoning as in [3T] leads to a
contradiction as soon as $\Omega_2$ is a singleton; this implies that $d(\lambda)$ goes to 0 as $\lambda$ goes to 0.

• Assume for example that $d(\lambda) = 0$ for $\lambda$ small enough. Then $v_\lambda(\omega_+) = v_\lambda(\omega_-)$, hence the values
won't change if we replace any transition from $\omega_+$ to $\omega_-$ by a transition from $\omega_+$ to $\omega_+$, and
any transition from $\omega_-$ to $\omega_+$ by a transition from $\omega_-$ to $\omega_-$. But the resulting game is just two
absorbing games played in parallel, and absorbing games have an asymptotic value, a contradiction.
If $d(\lambda) = o(\lambda)$, the values won't change "much" in the auxiliary game, and the contradiction is the
same.

Denote by $\sqrt{\cdot}$ the function $x \mapsto \sqrt{x}$. Because of the reasons stated above, in this section we fix $d = \sqrt{\cdot}$.
Since the payoff function is bounded, it is easy to see that if $(s, \sqrt{\cdot})$ is feasible and $s$ is continuously
differentiable, then $s$ and $\lambda s'(\lambda)$ are bounded. We now prove a converse:

Proposition 7. Let $s \in C^1(]0, \frac{1}{16}], \mathbb{R})$. Assume that $s$ and $x \mapsto x s'(x)$ are both bounded by $\frac{1}{16}$. Then
$(s, \sqrt{\cdot})$ is feasible.

Theorem 4 is an immediate consequence since there are functions $s(x)$ satisfying the assumptions of
Proposition 7 but without a limit as $x$ goes to 0. Take for example $s(x) = \frac{\sin(\ln x)}{16}$.
We start by a technical lemma. 

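Lemma 8. Let $s \in C^1(]0, \frac{1}{16}], \mathbb{R})$. Assume that $s$ and $x \mapsto x s'(x)$ are both bounded by $C$. Then the two
functions defined on $]0, \frac{1}{16}]^2$ to $\mathbb{R}$ by

\[ f_1(x,y) = \begin{cases} \dfrac{\sqrt{x}\, s(x) - \sqrt{y}\, s(y)}{\sqrt{x} - \sqrt{y}} & \text{if } x \neq y \\[2mm] 2x s'(x) + s(x) & \text{if } x = y \end{cases}
\qquad \text{and} \qquad
f_2(x,y) = \begin{cases} \dfrac{\sqrt{y}\, s(x) - \sqrt{x}\, s(y)}{\sqrt{x} - \sqrt{y}} & \text{if } x \neq y \\[2mm] 2x s'(x) - s(x) & \text{if } x = y \end{cases} \]

are jointly continuous and bounded by $3C$.

We stress that we do not need $s$ to have a limit, as $x$ goes to 0, for this lemma to hold (and, in fact,
this is precisely what will allow us to construct our counterexample).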
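Proof. For $x \neq y$, $f_1(x,y) = (\sqrt{x} + \sqrt{y}) \dfrac{\sqrt{x}\, s(x) - \sqrt{y}\, s(y)}{x - y}$, hence the mean value theorem ensures that $f_1$ is
continuous. Moreover, for $y < x$,

\[ |f_1(x,y)| = \frac{1}{\sqrt{x} - \sqrt{y}} \left| \int_y^x \left( \frac{s(z)}{2\sqrt{z}} + \sqrt{z}\, s'(z) \right) dz \right|
\leq \frac{1}{\sqrt{x} - \sqrt{y}} \int_y^x \frac{3C}{2\sqrt{z}}\, dz = 3C. \]

For $x \neq y$, $f_2(x,y) = (\sqrt{x} + \sqrt{y}) \sqrt{xy}\, \dfrac{\frac{s(x)}{\sqrt{x}} - \frac{s(y)}{\sqrt{y}}}{x - y}$, hence the mean value theorem ensures that $f_2$ is
continuous. Moreover, for $y < x$,

\[ |f_2(x,y)| = \frac{\sqrt{xy}}{\sqrt{x} - \sqrt{y}} \left| \int_y^x \left( \frac{s'(z)}{\sqrt{z}} - \frac{s(z)}{2z\sqrt{z}} \right) dz \right|
\leq \frac{\sqrt{xy}}{\sqrt{x} - \sqrt{y}} \int_y^x \frac{3C}{2z\sqrt{z}}\, dz = 3C. \]

□

We denote by $C^1(A, B)$ the set of continuously differentiable functions from $A$ to $B$.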
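The bound of the lemma can be illustrated numerically. The sketch below takes the oscillating function
$s(x) = \sin(\ln x)/16$ (an illustrative choice for this sketch, for which $C = 1/16$), writes $f_1$ and $f_2$ as
difference quotients in $\sqrt{x}$ with value $2xs'(x) \pm s(x)$ on the diagonal, and checks on a geometric grid
of $]0, \frac{1}{16}]^2$ that both stay below $3C$ in absolute value even though $s$ has no limit at 0. The grid
and the variable names are hypothetical.

```python
import math

C = 1.0 / 16.0  # bound on |s| and on |x s'(x)| for the choice of s below


def s(x):
    return math.sin(math.log(x)) / 16.0


def s_prime(x):
    return math.cos(math.log(x)) / (16.0 * x)


def f1(x, y):
    if x == y:
        return 2.0 * x * s_prime(x) + s(x)
    rx, ry = math.sqrt(x), math.sqrt(y)
    return (rx * s(x) - ry * s(y)) / (rx - ry)


def f2(x, y):
    if x == y:
        return 2.0 * x * s_prime(x) - s(x)
    rx, ry = math.sqrt(x), math.sqrt(y)
    return (ry * s(x) - rx * s(y)) / (rx - ry)


# Geometric grid from 1/16 down to about 1e-10, so that small actions are well represented.
grid = [2.0 ** (-4.0 - k / 4.0) for k in range(120)]
worst = max(max(abs(f1(x, y)), abs(f2(x, y))) for x in grid for y in grid)
print(worst, 3.0 * C)
```

A finite grid is of course no substitute for the proof; it only shows that the bound does not deteriorate
as the actions approach 0.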



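For $d = \sqrt{\cdot}$ we remark that the quantities defined in (11) to (14) can be rewritten as, for $\lambda \neq \mu$ in
$]0, \frac{1}{16}]$:

\[ p_+(\lambda,\mu) = \frac{(\sqrt{\lambda} + \sqrt{\mu})(1 - \sqrt{\lambda} - s(\lambda))(1 - \sqrt{\mu} - s(\mu))}{2(1-\lambda)(1-\mu)(1 + f_2(\lambda,\mu))} \tag{15} \]

\[ p^*_+(\lambda,\mu) = \frac{\sqrt{\lambda\mu}\left[ (1 - \sqrt{\lambda})(1 - \sqrt{\mu}) - f_1(\lambda,\mu) + \sqrt{\lambda\mu}\, f_2(\lambda,\mu) \right]}{(1-\lambda)(1-\mu)(1 + f_2(\lambda,\mu))} \tag{16} \]

\[ p_-(\lambda,\mu) = \frac{(\sqrt{\lambda} + \sqrt{\mu})(1 - \sqrt{\lambda} + s(\lambda))(1 - \sqrt{\mu} + s(\mu))}{2(1-\lambda)(1-\mu)(1 - f_2(\lambda,\mu))} \tag{17} \]

\[ p^*_-(\lambda,\mu) = \frac{\sqrt{\lambda\mu}\left[ (1 - \sqrt{\lambda})(1 - \sqrt{\mu}) + f_1(\lambda,\mu) - \sqrt{\lambda\mu}\, f_2(\lambda,\mu) \right]}{(1-\lambda)(1-\mu)(1 - f_2(\lambda,\mu))} \tag{18} \]

The four following lemmas establish that the regularity conditions in Lemma 6 are satisfied under the
assumptions of Proposition 7.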

Lemma 9. Let $s \in C^1(]0, \frac{1}{16}], \mathbb{R})$. Assume that $s$ and $x \mapsto x s'(x)$ are both bounded by $\frac{1}{16}$. Then the
function defined on $[0, \frac{1}{16}]^2$ by

\[ p^*_+(x,y) = \begin{cases} \dfrac{\sqrt{xy}\left[ (1 - \sqrt{x})(1 - \sqrt{y}) - f_1(x,y) + \sqrt{xy}\, f_2(x,y) \right]}{(1-x)(1-y)(1 + f_2(x,y))} & \text{if } xy > 0 \\[2mm] 0 & \text{if } xy = 0 \end{cases} \]

is well defined, jointly continuous, with values in $[0, \frac{1}{2}]$.

Proof. Lemma 8 implies that the denominator is positive when $xy > 0$, hence $p^*_+$ is well defined. The same
lemma also implies that $p^*_+$ is jointly continuous on $]0, \frac{1}{16}]^2$. Finally, the bounds on $f_1$ and $f_2$ and the fact
that $x$ and $y$ are less than $\frac{1}{16}$ imply that

\[ \frac{9/16 - 3/16 - 3/256}{19/16} \sqrt{xy} \leq p^*_+(x,y) \leq \frac{1 + 3/16 + 3/256}{(15/16)^2 \times 13/16} \sqrt{xy} = \frac{4912}{2925} \sqrt{xy} < \frac{1}{2}, \]

hence $p^*_+$ is also jointly continuous at any $(x, y)$ with $xy = 0$, and takes its values in $[0, \frac{1}{2}]$. □



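Lemma 10. Let $s \in C^1(]0, \frac{1}{16}], \mathbb{R})$. Assume that $s$ and $x \mapsto x s'(x)$ are both bounded by $\frac{1}{16}$. Then the
function defined on $[0, \frac{1}{16}]^2$ by

\[ p^*_-(x,y) = \begin{cases} \dfrac{\sqrt{xy}\left[ (1 - \sqrt{x})(1 - \sqrt{y}) + f_1(x,y) - \sqrt{xy}\, f_2(x,y) \right]}{(1-x)(1-y)(1 - f_2(x,y))} & \text{if } xy > 0 \\[2mm] 0 & \text{if } xy = 0 \end{cases} \]

is well defined, jointly continuous, with values in $[0, \frac{1}{2}]$.

Proof. Same as the previous proof, replacing $s$ by its opposite. □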
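Lemma 11. Let $s \in C^1(]0, \frac{1}{16}], \mathbb{R})$. Assume that $s$ and $x \mapsto x s'(x)$ are both bounded by $\frac{1}{16}$. Then the
function defined on $[0, \frac{1}{16}]^2$ by

\[ p_+(x,y) = \begin{cases}
\dfrac{(\sqrt{x} + \sqrt{y})(1 - \sqrt{x} - s(x))(1 - \sqrt{y} - s(y))}{2(1-x)(1-y)(1 + f_2(x,y))} & \text{if } xy > 0 \\[2mm]
\dfrac{\sqrt{x}(1 - \sqrt{x} - s(x))}{2(1-x)} & \text{if } x > 0 \text{ and } y = 0 \\[2mm]
\dfrac{\sqrt{y}(1 - \sqrt{y} - s(y))}{2(1-y)} & \text{if } y > 0 \text{ and } x = 0 \\[2mm]
0 & \text{if } x = y = 0
\end{cases} \]

is well defined, jointly continuous, with values in $[0, \frac{1}{2}]$.

Proof. Lemma 8 implies that the denominator is positive when $xy > 0$, hence $p_+$ is well defined. The same
lemma also implies that $p_+$ is jointly continuous on $]0, \frac{1}{16}]^2$.

Remarking that, for $x \neq y$, $1 + f_2(x,y) = \frac{1}{\sqrt{x} - \sqrt{y}} \left( \sqrt{x}(1 - \sqrt{y} - s(y)) - \sqrt{y}(1 - \sqrt{x} - s(x)) \right)$, one gets

\[ p_+(x,y) = \frac{x - y}{2(1-x)(1-y)} \cdot \frac{1}{\dfrac{\sqrt{x}}{1 - \sqrt{x} - s(x)} - \dfrac{\sqrt{y}}{1 - \sqrt{y} - s(y)}}, \]

hence the joint continuity of $p_+$ at any point where $xy = 0$ and $x + y > 0$.

Finally, the bounds on $f_1$ and $f_2$ and the fact that $x$ and $y$ are less than $\frac{1}{16}$ imply that

\[ \frac{(11/16)^2}{2 \times 19/16} (\sqrt{x} + \sqrt{y}) \leq p_+(x,y) \leq \frac{(17/16)^2}{2 \times (15/16)^2 \times 13/16} (\sqrt{x} + \sqrt{y}) = \frac{2312}{2925} (\sqrt{x} + \sqrt{y}) < \frac{1}{2}, \]

hence $p_+$ is also jointly continuous at $(0, 0)$, and takes its values in $[0, \frac{1}{2}]$. □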
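The numerical constants appearing in the upper bounds on $p^*_+$ and $p_+$ can be re-derived with exact
rational arithmetic. The sketch below only assumes the bounds used in the proofs: $x, y \leq \frac{1}{16}$ (so
$\sqrt{x}, \sqrt{y} \leq \frac{1}{4}$), $|s| \leq \frac{1}{16}$, and $|f_1|, |f_2| \leq \frac{3}{16}$. The variable names are hypothetical.

```python
from fractions import Fraction as F

x_max = F(1, 16)      # largest action; sqrt(x) <= 1/4 and sqrt(xy) <= 1/16
f_bound = F(3, 16)    # bound 3C on |f1| and |f2| with C = 1/16
s_bound = F(1, 16)    # bound on |s|

# Smallest possible denominator (1 - x)(1 - y)(1 + f2): (15/16)^2 * 13/16.
den_min = (1 - x_max) ** 2 * (1 - f_bound)

# Upper bound p*_+ <= c_star * sqrt(xy): the bracket is at most 1 + 3/16 + 3/256.
c_star = (1 + f_bound + x_max * f_bound) / den_min

# Upper bound p_+ <= c_plus * (sqrt(x) + sqrt(y)): each factor 1 - sqrt(.) - s(.) is at most 17/16.
c_plus = (1 + s_bound) ** 2 / (2 * den_min)

upper_star = c_star * F(1, 16)   # using sqrt(xy) <= 1/16
upper_plus = c_plus * F(1, 2)    # using sqrt(x) + sqrt(y) <= 1/2
print(c_star, c_plus, upper_star, upper_plus)
```

Both upper bounds stay below $\frac{1}{2}$, which is what guarantees that $1 - p^*_+ - p_+$ is a probability.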



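Lemma 12. Let $s \in C^1(]0, \frac{1}{16}], \mathbb{R})$. Assume that $s$ and $x \mapsto x s'(x)$ are both bounded by $\frac{1}{16}$. Then the
function defined on $[0, \frac{1}{16}]^2$ by

\[ p_-(x,y) = \begin{cases}
\dfrac{(\sqrt{x} + \sqrt{y})(1 - \sqrt{x} + s(x))(1 - \sqrt{y} + s(y))}{2(1-x)(1-y)(1 - f_2(x,y))} & \text{if } xy > 0 \\[2mm]
\dfrac{\sqrt{x}(1 - \sqrt{x} + s(x))}{2(1-x)} & \text{if } x > 0 \text{ and } y = 0 \\[2mm]
\dfrac{\sqrt{y}(1 - \sqrt{y} + s(y))}{2(1-y)} & \text{if } y > 0 \text{ and } x = 0 \\[2mm]
0 & \text{if } x = y = 0
\end{cases} \]

is well defined, jointly continuous, with values in $[0, \frac{1}{2}]$.

Proof. Same as the previous proof, replacing the function $s$ by its opposite. □

Proof of Proposition 7. It is an immediate consequence of Lemma 6 since by the four preceding lemmas
the functions defined by equations (11) to (14) have all the required properties. □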

3.4. The case of finitely repeated stochastic games. In this section we construct examples where $v_n$
does not converge as $n$ goes to infinity. The idea is to construct an example in which $v_\lambda$ does not converge
and such that the sequence $v_n$ has the same asymptotic behavior as $v_\lambda$. The following lemma is a slight
variation of a result of Neyman [18].



Lemma 13. Let $\Gamma$ be any stochastic game. Assume that $v_\lambda$ is of class $C^1$, and that for all $\omega$, $\frac{dv_\lambda}{d\lambda}(\omega) = o(1/\lambda)$.
Then $v_n$ and $v_\lambda$ have the same accumulation points.

Proof. Denote $w_n = v_\lambda$ for $\lambda = \frac{1}{n}$. By assumption and the mean value theorem,



sup 

r 1 i_ 



Hence $v_\lambda$ and $w_n$ have the same accumulation points.
By an argument due to Neyman (Theorem 4 in [18]),



n — 1 
n ^ — ^ 



i=l 
So we only need to observe that, by the mean value theorem,



* + >.e[l/iS/i+l] 
= 0(1). 



dvx 



dA 



□ 



We can now prove Theorem 3.



Proof of Theorem 3. Let $s(x) = \frac{\sin(\ln(-\ln x))}{16}$ and $d(x) = \sqrt{x}$. Since $s'(x) = \frac{\cos(\ln(-\ln x))}{16\, x \ln x}$, $x s'(x)$ is
bounded by $\frac{1}{64 \ln(2)} \leq \frac{1}{16}$, and goes to 0 as $x$ goes to 0. By Proposition 7, $(s, d)$ is feasible but $v_\lambda$
does not converge as $\lambda$ goes to 0. By Lemma 13, $v_n$ does not converge as $n$ goes to infinity. □

Remark 14. In fact, we could prove, in exactly the same way, that for the example constructed in the last
proof, any admissible sequence $z_n$ (as defined in [30]) also diverges as $n$ goes to infinity.

4. Concluding remarks and open problems 

We first point out that these examples are minimal in several aspects: 

• There are only two nonabsorbing states. Compact games with only one nonabsorbing state (called
absorbing games) have an asymptotic value [24].

• In nonabsorbing states, the payoff does not depend on actions. If the payoff also did not depend on
the current nonabsorbing state, the game would be a compact recursive game and would have an
asymptotic value [53].

• There are exactly two initial states $\omega$ (the two absorbing states) such that $v_n(\omega)$ converges. For every
compact stochastic game, there are at least two initial states such that $v_n(\omega)$ converges [15] [18].

• Also remark that the action sets are not general compact sets but rather real intervals, and that all
discounted games $\Gamma_\lambda$ have values in pure strategies.

At the beginning of Section 3.3 we gave necessary conditions for $(s, d)$ to be feasible. In fact, quite
surprisingly, it turns out that those necessary conditions are almost sufficient. Explicitly, one can prove,
using the techniques presented in Section 3.3, that:

Proposition 15. Let $s$ and $d$ be two continuously differentiable functions from $]0, \frac{1}{16}]$ to $\mathbb{R}$. Assume that

• $s$ and $\lambda s'(\lambda)$ are bounded.

• $d$ is nonnegative and there exists $\varepsilon > 0$ such that for all $\lambda \in ]0, 1]$,

Ad' (A) 

Then {As, Bd) is feasible for any nonnegative constants A and B small enough. 

Let us now briefly discuss the regularity of the transition functions in our counterexamples. One may
remark that while these functions are constructed to be continuous, they are not continuously differentiable
at 0, nor even Lipschitz-continuous. However, we claim that this lack of regularity is not at all the reason
for the divergence of $v_\lambda$. Indeed, for any nonnegative $r$, replacing $x$ and $y$ by $x^r$ and $y^r$ in the definition of
the transition functions will not change the values (it is just a relabeling of actions) while it will regularize
the transition functions. To say it another way, we only considered games where the pure action $\lambda$ is
optimal in $\Gamma_\lambda$, but this is just to make calculations easier, and relaxing this assumption we can construct
transitions as regular as one wants.

Rather than the regularity of the transition functions, we argue that the issue here is their infinite
number of oscillations. Recall that in our counterexample, $p_+(i,j)$ is of the order of $\sqrt{i} + \sqrt{j}$, and $p^*_+(i,j)$
is of the order of $\sqrt{ij}$. Thus in $\Gamma_\lambda$, starting from $\omega_+$, Player 2 should play neither a high $j$ (otherwise
Player 1 may absorb with payoff 1 and high probability) nor a low $j$ (otherwise Player 1 may stay in $\omega_+$
with payoff 1 until the game is essentially finished). Hence he should play the intermediate action $j = \lambda$,
and the same thing is true in $\omega_-$ and for Player 1. So, under optimal play, the order of magnitude of
the time between two transitions from a nonabsorbing state to the other is $\lambda^{-1/2}$. Hence, after the first



The function $\frac{\sin(\ln x)}{16}$ used previously would not work here since its derivative is not $o(1/x)$.
In particular, any function $\lambda^\alpha$ for $\alpha \in ]0, 1[$ satisfies this condition.
$\lambda^{-2/3}$ stages (during which the accumulated discounted payoff has been negligible), there still has been
no absorption with probability almost 1, and the occupation measure has almost reached the invariant
measure $\frac{p_-(\lambda,\lambda)}{p_+(\lambda,\lambda) + p_-(\lambda,\lambda)} \cdot \delta_{\omega_+} + \frac{p_+(\lambda,\lambda)}{p_+(\lambda,\lambda) + p_-(\lambda,\lambda)} \cdot \delta_{\omega_-}$. We have thus established that the discounted payoff
given nonabsorption is approximately $\frac{p_-(\lambda,\lambda) - p_+(\lambda,\lambda)}{p_+(\lambda,\lambda) + p_-(\lambda,\lambda)}$; similarly one sees that the expected payoff after
absorption is a function of the relative importance of $p_+$, $p_-$, $p^*_+$ and $p^*_-$. Oscillations of the ratios between
these quantities thus imply oscillations of the discounted payoff under optimal play, which is $v_\lambda$. For
a compact game that is the mixed extension (up to relabeling of actions) of some finite game, these
quantities cannot oscillate infinitely often, and we think this is the reason why the values converge in
the finite framework. Since semi-algebraic functions cannot oscillate infinitely often, the following natural
question is rather intriguing:

(i) Let $\Gamma$ be a compact game with semi-algebraic payoff and transition functions. Is there an asymptotic
value?

When there is only one player, while there may be no 0-optimal play in the infinite game [5], the 
asymptotic value always exists for compact games [H [TU] . In fact it exists with no hypotheses at all on the 
action set as long as the number of states is finite [51]. When there are two players, the asymptotic value 
exists for games with finitely many actions for each player [5] , but we showed that asymptotic value may 
not exist for compact games. This leads to this question in an intermediate setting: 

(ii) Let $\Gamma$ be a compact game in which Player 1 has a finite number of actions. Is there an asymptotic
value?

Finally, the asymptotic value exists for compact games in which Player 2 has no influence on the
transition, and in fact even in a more general setting in which Player 2 is also not perfectly informed of the
state [22]. In our construction it is important that the transitions are jointly controlled. From $\omega_+$, Player
1 cannot ensure going to $1^*$ with positive probability, while Player 2 can force a transition to $\omega_-$ with high
probability. However Player 2 cannot at the same time prevent any transition to $1^*$ and ensure a positive
probability of going to $\omega_-$. Hence one may wonder:

(iii) Let $\Gamma$ be a compact game in which each state is controlled by one player (but different states may
be controlled by different players). Is there an asymptotic value?

The answer to these three questions is not known in general; however the particular case of semi-algebraic
games which also satisfy either condition (ii) or (iii) is settled (with a positive answer) in [8].

A last remark is that there is a huge gap between compact games with one and two nonabsorbing states.
We just showed that the asymptotic value may fail to exist for games with two nonabsorbing states, while for one
nonabsorbing state the stronger notion of uniform value (when the payoffs are observed) also holds [16].
In fact it does not seem easy to construct a compact game with an asymptotic but no uniform value (when
the payoffs are observed).